Skip to content

Conversation

@aothms
Copy link
Collaborator

@aothms aothms commented Nov 2, 2025

Really basic.

This depends on clamav for detecting large zips (by setting limits on max extracted contents and by making encountering such limits a failure).

Also depends on filetype to extract content type from file contents.

Current check is not really interesting (maybe a specific purpose detection for zip bombs make more sense) and loading signatures in clamav takes some time (chose not to run the daemon because of mem usage). But usage of clamav could also be extended in the future.

The file contents are erased when mimetype is detected (such as PNG) from the file contents bytes. The file is not deleted because that would cause integrity errors when a file with the same name is uploaded again (apparently the file postfixes for uniqueness are based on filesystem not db)

@aothms aothms requested review from Ghesselink and rw-bsi November 2, 2025 11:56
@aothms aothms changed the title Run ClamAV and file magic detection to erase potentially harmful content IVS-605 Run ClamAV and file magic detection to erase potentially harmful content Nov 2, 2025
@rw-bsi
Copy link
Contributor

rw-bsi commented Nov 8, 2025

@aothms added missing pip and some extra output for DJ admin console
few unit and integration tests with eg. eicar file/content would be convenient
UPDATE: added 2 tests

<Link href={`${FETCH_PATH}/api/download/${row.id}`} underline="hover" onClick={evt => evt.stopPropagation()}>
{'Download file'}
</Link>
{
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea was already to strip the file contents so that downloading is harmless https://github.com/buildingSMART/validate/pull/248/files#diff-5db999c346dafe19127402f8a9220bff8cafead114de27b3e971a88d55e4b0dcR162

But thanks for all the additions to this 👍

@rw-bsi
Copy link
Contributor

rw-bsi commented Nov 10, 2025

@aothms updated the dockerfile as it didn't work there
f51a00b

In a future release we can change it for clamdscan (with .conf file?), so scans are much faster

@rw-bsi rw-bsi merged commit 1cca6fa into development Nov 10, 2025
3 checks passed
@rw-bsi rw-bsi deleted the tfk-magic-and-clamav branch November 10, 2025 21:59
@aothms
Copy link
Collaborator Author

aothms commented Nov 12, 2025

In a future release we can change it for clamdscan (with .conf file?), so scans are much faster

Yes so far I did not because of the continuous memory consumption for having the signatures (I assume) always in memory. By loading them as part of the task we kind of naturally follow the memory consumption pattern of the other tasks. But I agree, the slowdown is hard to justify

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants